444 Final Project - Housing Data

Author

Brodie Gilbert

Published

November 21, 2025

Part 3

Introduction

Our team is looking at various trends when it comes to housing data, we each did a different part using this housing data in different ways. My part of the project is about affordability of housing in different areas. I chose to use 3 main different places, with a bonus 4th place that I wanted to throw in at the end. The big 3 places are Los Angeles, a mega city, Nashville a large city and Oxford a medium to small city. First I will visualize my Zillow scraped data showing all the different places I scraped. Then I will examine the relationship of how cost effective it is to rent or buy in each of these cities with also examining the difference in price of each area being compared square footage amount of bedrooms and bathrooms to get a general idea on the price of the different areas and then I want to look into a small town around Nashville to see the difference in price being right outside a large city.

Note: The Zillow data is not all of the listings within each area. When scraping, Zillow blocked my attempts to grab all of the listings. Zillow after many attempts would only allow me to grab the first 9 listings per page because Zillow loads new listings once we scroll so when I am pulling from the HTML it cannot see those new listings.

Code
# Load Zillow data from each market
los_angeles <- read_csv("data/raw/los_angeles_ca_properties.csv", col_types = cols(.default = "c"))
nashville <- read_csv("data/raw/nashville_tn_properties.csv", col_types = cols(.default = "c"))
oxford <- read_csv("data/raw/oxford_ms_properties.csv", col_types = cols(.default = "c"))
gallatin <- read_csv("data/raw/gallatin_tn_properties.csv", col_types = cols(.default = "c"))

# Combine all markets
zillow_combined <- bind_rows(
  los_angeles %>% mutate(market = "Los Angeles, CA", city_size = "Mega City", state = "CA"),
  nashville %>% mutate(market = "Nashville, TN", city_size = "Large City", state = "TN"),
  oxford %>% mutate(market = "Oxford, MS", city_size = "Small City", state = "MS"),
  gallatin %>% mutate(market = "Gallatin, TN", city_size = "Small Town", state = "TN")
)

# Create market summary statistics
market_summary <- zillow_combined %>%
  mutate(
    price = as.numeric(gsub("[\\$,]", "", price)),
    sqft = as.numeric(gsub(",", "", sqft)),
    bedrooms = as.numeric(bedrooms),
    bathrooms = as.numeric(bathrooms),
    year_built = as.numeric(year_built),
    latitude = as.numeric(latitude),
    longitude = as.numeric(longitude),
    price_per_sqft = price / sqft
  ) %>%
  filter(!is.na(price), !is.na(sqft), price > 0, sqft > 0) %>%
  group_by(market, city_size, state) %>%
  summarize(
    total_listings = n(),
    avg_price = mean(price, na.rm = TRUE),
    median_price = median(price, na.rm = TRUE),
    min_price = min(price, na.rm = TRUE),
    max_price = max(price, na.rm = TRUE),
    avg_sqft = mean(sqft, na.rm = TRUE),
    avg_bedrooms = mean(bedrooms, na.rm = TRUE),
    avg_bathrooms = mean(bathrooms, na.rm = TRUE),
    avg_price_per_sqft = mean(price_per_sqft, na.rm = TRUE),
    .groups = "drop"
  )

# Clean dataset for property-level analysis
zillow_filtered <- zillow_combined %>%
  mutate(
    price = as.numeric(gsub("[\\$,]", "", price)),
    sqft = as.numeric(gsub(",", "", sqft)),
    bedrooms = as.numeric(bedrooms),
    bathrooms = as.numeric(bathrooms),
    year_built = as.numeric(year_built),
    latitude = as.numeric(latitude),
    longitude = as.numeric(longitude),
    price_per_sqft = price / sqft
  ) %>%
  filter(!is.na(price), !is.na(sqft), price > 0, sqft > 0)

Question 1: What is the average price of homes in each area? (Los Angeles, Nashville, and Oxford)

Before we can begin analyzing affordability across these three very different markets, we need to understand the basic price landscape. Los Angeles represents one of the most expensive housing markets in the entire country - a mega city with millions of residents competing for limited housing stock. Nashville represents a large, growing city that has seen significant population influx in recent years. Oxford, Mississippi is a small college town that offers a stark contrast to the other two markets.

Using the data I scraped from Zillow, here’s a summary of the average prices across all three markets:

Code
# Q1: Average prices by market
main_markets <- market_summary %>%
  filter(market %in% c("Los Angeles, CA", "Nashville, TN", "Oxford, MS")) %>%
  arrange(desc(avg_price))

main_markets %>%
  mutate(
    avg_price = dollar(avg_price),
    median_price = dollar(median_price),
    min_price = dollar(min_price),
    max_price = dollar(max_price),
    avg_sqft = comma(round(avg_sqft, 0)),
    avg_price_per_sqft = dollar(round(avg_price_per_sqft, 0))
  ) %>%
  select(market, city_size, total_listings, avg_price, median_price, avg_sqft, avg_price_per_sqft) %>%
  kbl(
    caption = "Average Home Prices by Market - Los Angeles, Nashville, and Oxford",
    col.names = c("Market", "City Size", "Listings", "Avg Price", "Median Price", "Avg Sq Ft", "Price/Sq Ft"),
    align = c("l", "l", "c", "r", "r", "r", "r")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE, position = "center") %>%
  row_spec(0, bold = TRUE, color = "white", background = "#2C3E50") %>%
  column_spec(4, bold = TRUE, color = "#E74C3C") %>%
  footnote(general = "Data scraped from Zillow, November 2025", general_title = "Source: ")
Average Home Prices by Market - Los Angeles, Nashville, and Oxford
Market City Size Listings Avg Price Median Price Avg Sq Ft Price/Sq Ft
Los Angeles, CA Mega City 204 $5,951,477 $1,324,000 3,540 $1,014
Nashville, TN Large City 205 $1,046,017 $530,000 2,373 $377
Oxford, MS Small City 183 $578,600 $488,500 2,337 $246
Source:
Data scraped from Zillow, November 2025

The results immediately confirm what most people would expect: Los Angeles has by far the highest average home prices, followed by Nashville, with Oxford being the most affordable. What’s particularly striking is the price per square foot - in Los Angeles you’re paying significantly more for the same amount of space compared to the other markets.

The difference in average price between Los Angeles and Oxford is substantial. This massive gap illustrates why many families are choosing to relocate from expensive coastal cities to more affordable areas in the South and Midwest.


Question 2: Where are these properties located geographically?

Now that we understand the price differences, I want to visualize where exactly these properties are located across the United States. This will help us see the geographic spread of our data and understand the regional context of each market.

Each property is plotted on the map with colors representing the different markets. You can click on any marker to see details about that property.

Code
# Q2: Interactive map of all properties
map_data <- zillow_filtered %>%
  filter(market %in% c("Los Angeles, CA", "Nashville, TN", "Oxford, MS")) %>%
  filter(!is.na(latitude), !is.na(longitude))

# Custom house icons for each market
market_icons <- awesomeIconList(
  "Los Angeles, CA" = makeAwesomeIcon(icon = "home", markerColor = "orange", library = "fa", iconColor = "white"),
  "Nashville, TN" = makeAwesomeIcon(icon = "home", markerColor = "blue", library = "fa", iconColor = "white"),
  "Oxford, MS" = makeAwesomeIcon(icon = "home", markerColor = "purple", library = "fa", iconColor = "white")
)

oxford_boundary <- places(state = "MS", year = 2022, progress_bar = FALSE) %>% filter(NAME == "Oxford")
nashville_boundary <- places(state = "TN", year = 2022, progress_bar = FALSE) %>% filter(NAME == "Nashville-Davidson metropolitan government (balance)")
la_boundary <- places(state = "CA", year = 2022, progress_bar = FALSE) %>% filter(NAME == "Los Angeles")

us_map <- leaflet(map_data) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  setView(lng = -89.5192, lat = 34.3665, zoom = 10) %>%
  addPolygons(data = oxford_boundary, fillColor = "#9B59B6", fillOpacity = 0.15, color = "#9B59B6", weight = 2) %>%
  addPolygons(data = nashville_boundary, fillColor = "#3498DB", fillOpacity = 0.15, color = "#3498DB", weight = 2) %>%
  addPolygons(data = la_boundary, fillColor = "#F39C12", fillOpacity = 0.15, color = "#F39C12", weight = 2) %>%
  addAwesomeMarkers(
    lng = ~longitude, lat = ~latitude, icon = ~market_icons[market],
    popup = ~paste0("<b>", market, "</b><br>", address, "<br><b>Price:</b> ", dollar(price),
                    "<br><b>Beds:</b> ", bedrooms, " | <b>Baths:</b> ", bathrooms, "<br><b>Sq Ft:</b> ", comma(sqft)),
    label = ~paste0(market, ": ", dollar(price)),
    clusterOptions = markerClusterOptions()
  ) %>%
  addLegend(position = "bottomright", colors = c("#F69730", "#38AADD", "#9B59B6"),
            labels = c("Los Angeles, CA", "Nashville, TN", "Oxford, MS"), title = "Markets", opacity = 0.8) %>%
  addEasyButton(easyButton(icon = htmltools::HTML("<b>LA</b>"), title = "Los Angeles",
    onClick = JS("function(btn, map){ map.setView([34.0522, -118.2437], 10); }"))) %>%
  addEasyButton(easyButton(icon = htmltools::HTML("<b>NSH</b>"), title = "Nashville",
    onClick = JS("function(btn, map){ map.setView([36.1627, -86.7816], 11); }"))) %>%
  addEasyButton(easyButton(icon = htmltools::HTML("<b>OXF</b>"), title = "Oxford",
    onClick = JS("function(btn, map){ map.setView([34.3665, -89.5192], 12); }")))

us_map

Looking at this map, we can see the geographic spread of our three markets. Los Angeles sits on the West Coast, Nashville is in the heart of Tennessee, and Oxford is nestled in northern Mississippi. The navigation buttons allow you to jump directly to each market to explore individual properties.


Question 3: How do home prices compare when we look at the distribution across city sizes?

The average prices tell part of the story, but they can be misleading if there are outliers pulling the numbers up or down. To get a better understanding of the price landscape, I want to look at how prices are distributed in each market. A histogram will show us not just the average, but where most homes actually fall in terms of price.

Code
# Q3: Price distribution histogram
main_market_data <- zillow_filtered %>%
  filter(market %in% c("Los Angeles, CA", "Nashville, TN", "Oxford, MS")) %>%
  filter(price < 2000000)

price_hist <- plot_ly(alpha = 0.6) %>%
  add_histogram(data = main_market_data %>% filter(market == "Los Angeles, CA"),
                x = ~price, name = "Los Angeles (Mega City)", marker = list(color = "#F39C12")) %>%
  add_histogram(data = main_market_data %>% filter(market == "Nashville, TN"),
                x = ~price, name = "Nashville (Large City)", marker = list(color = "#3498DB")) %>%
  add_histogram(data = main_market_data %>% filter(market == "Oxford, MS"),
                x = ~price, name = "Oxford (Small City)", marker = list(color = "#9B59B6")) %>%
  layout(
    title = list(text = "Home Price Distribution by City Size"),
    xaxis = list(title = "Price", tickformat = "$,.0f", rangeslider = list(visible = TRUE)),
    yaxis = list(title = "Number of Properties"),
    barmode = "overlay",
    legend = list(x = 0.65, y = 0.95)
  )

price_hist

The histogram reveals some fascinating patterns. Los Angeles properties are spread across a very wide price range, with many homes priced well over $1 million. This confirms that LA is a premium market where even modest homes command extremely high prices.

Nashville shows a more moderate distribution, with most homes clustering in the $400,000 to $700,000 range. This is still expensive by national standards, but much more accessible than Los Angeles.

Oxford, Mississippi shows a tight cluster of much lower prices, with most homes falling between $200,000 and $400,000. This is the most affordable of our three markets.


Question 4: Is renting or buying more cost-effective in each market?

Understanding home prices is important, but to truly assess affordability, we need to compare the cost of buying versus renting. I obtained Fair Market Rent (FMR) data from the U.S. Department of Housing and Urban Development (HUD) for each of these markets. The FMR represents what HUD considers a reasonable rent for a given area.

By comparing the median home price to the annual rent, we can calculate a “price-to-rent ratio.” This ratio tells us how many years of rent it would take to equal the purchase price of a home. Generally: - A ratio under 15 means buying is favorable - A ratio of 15-20 means it’s roughly equal - A ratio over 20 means renting might be more economically favorable

Code
# Q4: Price-to-rent ratio analysis
fmr_lookup <- data.frame(
  market = c("Los Angeles, CA", "Nashville, TN", "Oxford, MS", "Gallatin, TN"),
  county = c("Los Angeles County", "Davidson County", "Lafayette County", "Sumner County"),
  fmr_3br = c(3439, 2143, 1372, 1953)
)

market_with_fmr <- market_summary %>%
  left_join(fmr_lookup, by = "market") %>%
  mutate(annual_rent = fmr_3br * 12, price_to_rent_ratio = round(median_price / annual_rent, 1))

main_with_fmr <- market_with_fmr %>% filter(market %in% c("Los Angeles, CA", "Nashville, TN", "Oxford, MS"))

ggplot(main_with_fmr, aes(x = reorder(market, price_to_rent_ratio), y = price_to_rent_ratio, fill = city_size)) +
  geom_col(width = 0.6) +
  geom_hline(yintercept = 15, linetype = "dashed", color = "#27AE60", size = 1) +
  geom_hline(yintercept = 20, linetype = "dashed", color = "#F39C12", size = 1) +
  geom_text(aes(label = round(price_to_rent_ratio, 1)), vjust = -0.5, size = 4, fontface = "bold") +
  scale_fill_manual(values = c("#9B59B6", "#3498DB", "#F39C12")) +
  scale_y_continuous(limits = c(0, 35), breaks = seq(0, 35, 5)) +
  annotate("text", x = 0.6, y = 15.5, label = "Favor Buying", color = "#27AE60", size = 3, hjust = 0) +
  annotate("text", x = 0.6, y = 20.5, label = "Consider Renting", color = "#F39C12", size = 3, hjust = 0) +
  labs(title = "Price-to-Rent Ratio by Market", subtitle = "Lower ratio = buying is more favorable relative to renting",
       x = "", y = "Price-to-Rent Ratio (Years)", fill = "City Size") +
  theme_minimal() +
  theme(axis.text.x = element_text(size = 10), legend.position = "top", plot.title = element_text(face = "bold", size = 14)) +
  coord_flip()

This chart shows the price-to-rent ratio for each market.These thresholds are standard industry guidelines used in rent vs. buy analysis.

Code
# Q4: Rent vs buy summary table
main_with_fmr %>%
  select(market, city_size, median_price, fmr_3br, annual_rent, price_to_rent_ratio) %>%
  mutate(
    median_price = dollar(median_price),
    fmr_3br = dollar(fmr_3br),
    annual_rent = dollar(annual_rent),
    recommendation = case_when(
      price_to_rent_ratio < 15 ~ "Favor Buying",
      price_to_rent_ratio <= 20 ~ "About Equal",
      TRUE ~ "Consider Renting"
    )
  ) %>%
  kbl(
    caption = "Price-to-Rent Analysis: Is Buying or Renting More Cost-Effective?",
    col.names = c("Market", "City Size", "Median Price", "Monthly Rent (3BR)", "Annual Rent", "Price/Rent Ratio", "Recommendation"),
    align = c("l", "l", "r", "r", "r", "c", "c")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#34495E") %>%
  column_spec(6, bold = TRUE) %>%
  column_spec(7, bold = TRUE, color = "white",
              background = spec_color(c(25, 22, 15), scale_from = c(10, 30), palette = c("#27AE60", "#F39C12", "#E74C3C")))
Price-to-Rent Analysis: Is Buying or Renting More Cost-Effective?
Market City Size Median Price Monthly Rent (3BR) Annual Rent Price/Rent Ratio Recommendation
Los Angeles, CA Mega City $1,324,000 $3,439 $41,268 32.1 Consider Renting
Nashville, TN Large City $530,000 $2,143 $25,716 20.6 Consider Renting
Oxford, MS Small City $488,500 $1,372 $16,464 29.7 Consider Renting
  • Oxford, MS has the lowest ratio, making it the most favorable market for buying. With a ratio around 15, the cost of buying is competitive with renting over the long term.

  • Nashville, TN falls in the middle with a ratio around 20-22. This is borderline - neither option has a clear advantage.

  • Los Angeles, CA has the highest ratio at around 25+, suggesting that renting might actually be more economical than buying in this expensive market.

    Surprising our current home buying market is not in a great state to actually buy a home.


Bonus: What about a small town near a large city?

Up until now, we’ve compared three very different types of cities. But what if you dont want to be paying large city prices? This is Gallatin, Tennessee a surrounding small town ear Nashville TN. It offers access to Nashville’s job market while potentially providing more affordable housing.

Let’s compare Gallatin directly to Nashville. To see if the commute is worth it:

Code
# Bonus: Nashville vs Gallatin price comparison
tn_scatter_data <- zillow_filtered %>%
  filter(market %in% c("Nashville, TN", "Gallatin, TN")) %>%
  filter(!is.na(bedrooms), !is.na(price), bedrooms > 0) %>%
  mutate(
    price_category = case_when(
      price < 300000 ~ "Under $300K",
      price < 500000 ~ "$300K - $500K",
      price < 750000 ~ "$500K - $750K",
      TRUE ~ "$750K+"
    ),
    price_category = factor(price_category, levels = c("Under $300K", "$300K - $500K", "$500K - $750K", "$750K+"))
  )

ggplot(tn_scatter_data, aes(x = bedrooms, y = price, color = price_category, shape = market)) +
  geom_point(alpha = 0.8, size = 4) +
  scale_y_continuous(labels = dollar_format()) +
  scale_x_continuous(breaks = 1:6) +
  scale_color_manual(values = c("#27AE60", "#3498DB", "#F39C12", "#E74C3C")) +
  scale_shape_manual(values = c(16, 17)) +
  labs(
    title = "Price vs Bedrooms: Nashville vs Gallatin",
    subtitle = "Comparing home prices by bedroom count in both markets",
    x = "Number of Bedrooms",
    y = "Price",
    color = "Price Category",
    shape = "Market"
  ) +
  theme_minimal() +
  theme(legend.position = "right", plot.title = element_text(face = "bold", size = 14))

The scatter plot shows how price varies with bedroom count in both markets. The colors represent price categories while the shapes circle(Gallatin) and triangles(Nashville) show the area. This show how affordable each bed room type between these 2 cities.

Code
# Bonus: Nashville vs Gallatin summary
tn_detailed <- zillow_filtered %>%
  filter(market %in% c("Nashville, TN", "Gallatin, TN")) %>%
  group_by(market) %>%
  summarize(
    total_listings = n(),
    median_price = median(price, na.rm = TRUE),
    min_price = min(price, na.rm = TRUE),
    max_price = max(price, na.rm = TRUE),
    avg_sqft = mean(sqft, na.rm = TRUE),
    avg_bedrooms = round(mean(bedrooms, na.rm = TRUE), 1),
    avg_bathrooms = round(mean(bathrooms, na.rm = TRUE), 1),
    avg_price_per_sqft = mean(price_per_sqft, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(city_type = ifelse(market == "Nashville, TN", "Large City", "Small Town"))

tn_detailed %>%
  mutate(
    median_price = dollar(median_price),
    min_price = dollar(min_price),
    max_price = dollar(max_price),
    avg_sqft = comma(round(avg_sqft, 0)),
    avg_price_per_sqft = dollar(round(avg_price_per_sqft, 0))
  ) %>%
  select(market, city_type, total_listings, median_price, min_price, max_price,
         avg_sqft, avg_bedrooms, avg_bathrooms, avg_price_per_sqft) %>%
  kbl(
    caption = "Nashville vs Gallatin: Does Living in a Small Town Save Money?",
    col.names = c("Market", "City Type", "Listings", "Median Price", "Min Price",
                  "Max Price", "Avg Sq Ft", "Avg Beds", "Avg Baths", "Price/Sq Ft"),
    align = c("l", "l", "c", "r", "r", "r", "r", "c", "c", "r")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#2C3E50") %>%
  row_spec(2, background = "#D5F5E3")
Nashville vs Gallatin: Does Living in a Small Town Save Money?
Market City Type Listings Median Price Min Price Max Price Avg Sq Ft Avg Beds Avg Baths Price/Sq Ft
Gallatin, TN Small Town 101 $599,900 $179,900 $3,699,000 2,667 3.5 3.1 $281
Nashville, TN Large City 205 $530,000 $195,000 $14,900,000 2,373 3.4 3.0 $377

The comparison reveals that Gallatin offers noticeable savings compared to Nashville. While the difference might not be as dramatic but it still allows people to have access to larger cities job market and activities without living in the night life of the city.


Conclusion

After analyzing housing data across Los Angeles (mega city), Nashville (large city), Oxford (small city), and Gallatin (small town), several clear conclusions emerge about housing affordability:

Key Findings:

  1. City size directly correlates with price. Los Angeles homes average significantly more than Nashville, which in turn is more expensive than Oxford. This confirms the intuitive relationship between population density and housing costs.

  2. Price-to-rent ratios vary dramatically. In expensive markets like Los Angeles, renting may actually be more economical than buying when you factor in the extreme home prices. In smaller markets like Oxford, buying becomes much more attractive.

  3. Small towns near large cities offer a middle ground. Gallatin provides access to Nashville’s economy while offering somewhat lower housing costs. This “bedroom community” approach is worth considering for those who don’t need to live in the city center.

  4. Square footage don’t scale linearly with price. In Los Angeles, you pay much more per square foot than in other markets. A modest home in LA might cost more than a mansion in Oxford.